This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, high-dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Differently from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss) and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems and on a large-scale dataset involving simulated physical data. The results show that the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.
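To make the iterative scheme concrete, a generic SCA update can be sketched as follows; the notation is illustrative and not taken from the paper ($w^n$ denotes the NN weights at iteration $n$, $f$ the training loss, $r$ a convex regularizer with weight $\lambda$, and $\gamma^n$ a step size):

$$\hat{w}^n = \arg\min_{w} \; \tilde{f}(w; w^n) + \lambda\, r(w), \qquad w^{n+1} = w^n + \gamma^n \left(\hat{w}^n - w^n\right),$$

where $\tilde{f}(\cdot\,; w^n)$ is a strongly convex surrogate of $f$ built around the current iterate, e.g., a first-order model of the form $f(w^n) + \nabla f(w^n)^{\top}(w - w^n) + \tfrac{\tau}{2}\|w - w^n\|^2$ with $\tau > 0$, and $\gamma^n \in (0,1]$ is typically chosen as a diminishing step size. In a stochastic setting, the gradient $\nabla f(w^n)$ would be replaced by a mini-batch estimate.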